TNO at TDT2001: Language Model-Based Topic Detection
Abstract
Topic detection is concerned with the unsupervised clustering of news stories over time. The TNO topic detection system is based on a language modeling approach. To group stories, we combined a simple single-pass method, which establishes an initial clustering, with a reallocation method that stabilizes the clusters within the allowed deferral period. The similarity of an incoming story to an existing cluster is defined as the average of the similarities of the incoming story to each story in the cluster. These individual similarities are computed by taking the sum of the two generative probabilities (each story generated by the model of the other), where both stories are modeled as unigram language models. Because these story language models are based on extremely sparse statistics, the word probabilities are smoothed using a background model.
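The similarity computation described in the abstract can be illustrated with a minimal sketch. It is not the TNO implementation. The abstract does not give the smoothing formula, so linear interpolation with a weight `lam` is assumed here, and length-normalized log probabilities stand in for raw generative probabilities to avoid numerical underflow. All function names, the default `lam=0.5`, and the `1e-9` floor for words missing from the background model are hypothetical.

```python
import math
from collections import Counter


def background_model(stories):
    """Relative word frequencies over a collection of tokenized stories."""
    counts = Counter(word for story in stories for word in story)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def smoothed_prob(word, counts, length, background, lam):
    """Unigram probability of `word`, interpolated with the background model.

    Assumption: linear interpolation; the abstract only says that a
    background model is used for smoothing.
    """
    p_story = counts[word] / length if length else 0.0
    p_background = background.get(word, 1e-9)  # hypothetical floor for unseen words
    return lam * p_story + (1.0 - lam) * p_background


def avg_log_gen_prob(generated, model_tokens, background, lam=0.5):
    """Per-word average log probability of `generated` under the smoothed
    unigram model estimated from `model_tokens` (assumes `generated` is non-empty)."""
    counts = Counter(model_tokens)
    length = len(model_tokens)
    total = sum(
        math.log(smoothed_prob(word, counts, length, background, lam))
        for word in generated
    )
    return total / len(generated)


def story_similarity(a, b, background, lam=0.5):
    """Sum of the two generative scores: a under b's model, and b under a's."""
    return (avg_log_gen_prob(a, b, background, lam)
            + avg_log_gen_prob(b, a, background, lam))


def cluster_similarity(story, cluster, background, lam=0.5):
    """Average similarity of `story` to each story in a non-empty `cluster`."""
    return sum(
        story_similarity(story, member, background, lam) for member in cluster
    ) / len(cluster)


# Toy data, invented for illustration only.
corpus = [
    ["quake", "hits", "city"],
    ["quake", "damage", "city"],
    ["election", "vote", "results"],
    ["vote", "count", "election"],
]
bg = background_model(corpus)
incoming = ["quake", "city", "damage"]
quake_score = cluster_similarity(incoming, corpus[:2], bg)
election_score = cluster_similarity(incoming, corpus[2:], bg)
```

In a single-pass scheme, an incoming story would typically join the best-scoring cluster if the score exceeds a threshold and start a new cluster otherwise. The threshold and the reallocation step are not specified in the abstract and are left out of this sketch.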
Similar resources
The LIMSI Topic Tracking System for TDT2001
In this paper we describe the LIMSI topic tracking system used for the DARPA 2001 Topic Detection and Tracking evaluation (TDT2001). The system relies on a unigram topic model, where the score for an incoming document is the normalized likelihood ratio of the topic model and a general English model. In order to compensate for the very small amount of training data for each topic, document expan...
Using language models for tracking events of interest over time
This paper presents the TNO tracking system which was evaluated at the 2000 Topic Detection and Tracking evaluation project (TDT2000). The objective of the TDT tracking task is to track events of interest over time. We built a baseline tracking system based on a language modeling approach. This approach had proved to be powerful for the TREC adaptive filtering task and several other IR tasks.
Unsupervised Event Clustering in Multilingual News Streams
The Topic Detection and Tracking (TDT) benchmark evaluation project embraces a variety of technical challenges for information retrieval research. The TDT topic detection task is concerned with the unsupervised grouping of news stories according to the events they discuss. A detection system must both discover new events as the incoming stories are processed and associate incoming stor...
Description of NTU Approach to Link Detection Task in TDT2001
We participated in the link detection task and submitted four runs, including both manual and ASR transcription for audio resources; and both English translation and original Chinese character source stream for Mandarin sources. This paper will propose a method to tell if a pair of news stories discusses the same topic. Several issues are addressed, e.g., how to represent a news story, how to m...
A Language Modeling Approach to Tracking News Events
This paper presents the TNO tracking system for the 2000 Topic Detection and Tracking evaluation project (TDT2000). The objective of the TDT tracking task is to track events of interest over time. Being a first year participant to the TDT project, our original goal for this year was to build a baseline tracking system based on a language modeling approach. This approach had proved to be powerfu...